Classification of Anti-learnable Biological and Synthetic Data
Abstract
We demonstrate a binary classification problem in which standard supervised learning algorithms, such as linear and kernel SVM, naive Bayes, ridge regression, k-nearest neighbors, shrunken centroid, multilayer perceptron and decision trees, behave in an unusual way. On certain data sets they classify a randomly sampled training subset nearly perfectly, but systematically perform worse than random guessing on cases unseen in training. We demonstrate this phenomenon in the classification of a natural data set of cancer genomics microarrays, using cross-validation tests. Additionally, we generate a range of synthetic data sets, the outcomes of zero-sum games, for which we analyse this phenomenon in the i.i.d. setting. Furthermore, we propose and evaluate a remedy that yields promising results for classifying such data as well as normal data sets. We simply apply an additional 1-dimensional linear transformation to the classifier scores, chosen, for instance, to maximize the classification accuracy of the outputs of an internal cross-validation on the training set. We also discuss the relevance to other fields such as learning theory, boosting, regularization, sample bias and the application of kernels.
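The remedy described in the abstract, a 1-dimensional linear transformation of classifier scores fitted on internal cross-validation outputs, might look roughly like the sketch below. This is only an illustration under stated assumptions: the function names `fit_score_transform` and `apply_score_transform` are hypothetical, labels are assumed to lie in {-1, +1}, and the exhaustive search over a sign and a threshold is one simple way to maximize accuracy, not necessarily the procedure used in the paper.

```python
import numpy as np


def fit_score_transform(cv_scores, labels):
    """Choose (a, b) so that sign(a * s + b) best matches labels in {-1, +1}.

    cv_scores are assumed to be classifier outputs collected from an
    internal cross-validation on the training set (hypothetical setup).
    Accuracy depends only on the sign of a and the threshold -b / a,
    so it is enough to search a in {+1, -1} and a finite set of thresholds.
    """
    s = np.asarray(cv_scores, dtype=float)
    y = np.asarray(labels)
    u = np.unique(s)  # sorted distinct scores
    # Candidate thresholds: below all scores, between neighbours, above all.
    thresholds = np.concatenate(
        ([u[0] - 1.0], (u[:-1] + u[1:]) / 2.0, [u[-1] + 1.0])
    )
    best_acc, best_a, best_b = -1.0, 1.0, 0.0
    for a in (1.0, -1.0):
        for t in thresholds:
            b = -a * t
            pred = np.where(a * s + b > 0, 1, -1)
            acc = float(np.mean(pred == y))
            if acc > best_acc:
                best_acc, best_a, best_b = acc, a, b
    return best_a, best_b, best_acc


def apply_score_transform(scores, a, b):
    """Apply the fitted 1-dimensional linear map to new classifier scores."""
    return a * np.asarray(scores, dtype=float) + b


# Toy illustration: cross-validated scores that point the wrong way,
# as one would expect from a classifier that performs below chance.
cv_scores = [0.9, 0.7, 0.4, -0.3, -0.6, -0.8]
labels = [-1, -1, -1, 1, 1, 1]
a, b, acc = fit_score_transform(cv_scores, labels)
corrected = np.sign(apply_score_transform([0.5, -0.5], a, b))
```

On scores that are systematically inverted, the search selects a negative slope, so the transformed scores classify in the opposite direction. On an ordinary data set the same search would keep a positive slope and only adjust the threshold, which is consistent with the abstract's claim that the remedy also suits normal data sets.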
Similar resources
Need of Systems Approach for Biological Explanation of Anti-learnable Signatures
We present simple formal models explaining unusual properties of several biological classification tasks as follows. For these datasets the whole range of supervised learning techniques generate predictive models which classify independent test samples systematically below the performance of random guessing (hence the name anti-learning). We show that explanation of such “counter-intuitive” sup...
The Antiglycation Ability of Typical Medicinal Plants, Natural and Synthetic Compounds: A Review
Given the prevalence of diabetes and the increasing number of diabetics, it is essential to find medicines to decrease the chronic complications of diabetes. Several studies have demonstrated that chronic hyperglycemia and its complications are directly related to protein glycation. Thus, identifying natural inhibitors to stop glycation of proteins may play a crucial role in managing the chroni...
Polarimetric Synthetic Aperture Radar Image Classification using Bag of Visual Words Algorithm
Land cover is defined as the physical material of the surface of the earth, including different vegetation covers, bare soil, water surface, various urban areas, etc. Land cover and its changes are very important and influential on the Earth and life of living organisms, especially human beings. Land cover change monitoring is important for protecting the ecosystem, forests, farmland, open spac...
Expression of an Innate Immune Element (Mouse Hepcidin-1) in Baculovirus Expression System and the Comparison of Its Function with Synthetic Human Hepcidin-25
Hepcidin is an innate immune element that decreases iron absorption from the diet and iron release from macrophages. In contrast to chemical iron chelators, there has been limited effort to use hepcidin specifically as a new drug for decreasing iron overload. Hepcidin is produced in different biological systems. For instance, E. coli is used for human hepcidin expressio...
Publication date: 2007